We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dataset, using an effective data preprocessing method that alleviates the scarcity of unprocessed dry data. We analyze the proposed encoder for its ability to disentangle audio effects and also validate its performance for mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only converts the mixing style of multitrack audio close to that of a reference but is also robust for mixture-wise style transfer when combined with a music source separation model.
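The abstract does not spell out the contrastive objective. A common choice for this kind of encoder is an InfoNCE-style loss. The minimal NumPy sketch below makes two assumptions. First, positive pairs are embeddings of two different segments processed with the same audio effects. Second, every other item in the batch serves as a negative. The function name `info_nce_loss` and the temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss. Row i of `anchors` and row i of `positives` form a positive pair
    (hypothetically: two segments rendered with the same audio effects); all other
    rows of `positives` act as negatives for anchor i."""
    # L2-normalize so that dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy whose target class for row i is column i (the diagonal)
    return float(-np.mean(np.diag(log_probs)))
```

Under such a loss, an encoder would be pushed to embed recordings that share effects close together regardless of their musical content. This corresponds to the effects-only behavior the abstract describes.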
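We study the problem of model estimation and reward-free learning in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function (the mapping from observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP. We then study the problem of learning near-optimal policies in the reward-free framework. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible rate. Interestingly, our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of possible contexts.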
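The abstract does not define the "error rate" of the decoding function. One natural metric, offered here as an assumption, is the fraction of contexts assigned to the wrong latent state. This fraction should be minimized over relabelings of the latent states, because latent labels are only identifiable up to permutation. The hypothetical helper below computes it by brute force, which is practical only for a small number of latent states.

```python
from itertools import permutations
import numpy as np

def decoding_error_rate(true_states, estimated_states, num_states):
    """Fraction of contexts whose latent state is decoded incorrectly, minimized
    over all relabelings of the estimated states (labels 0..num_states-1)."""
    true_states = np.asarray(true_states)
    estimated_states = np.asarray(estimated_states)
    best = 1.0
    for perm in permutations(range(num_states)):
        relabeled = np.array(perm)[estimated_states]  # apply the relabeling
        best = min(best, float(np.mean(relabeled != true_states)))
    return best
```

For larger state spaces, the same minimization can be solved as an assignment problem on the confusion matrix (for example, with the Hungarian algorithm) instead of enumerating all permutations.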
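This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between the dimensionality-reduced conditional distributions of different protected classes. Incorporating MMD naturally leads to an exact and tractable mathematical formulation of fairness with good statistical properties. We formulate fair PCA subject to MMD constraints as a non-convex optimization problem over the Stiefel manifold and solve it using the Riemannian Exact Penalty Method with Smoothing (REPMS; Liu and Boumal, 2019). Importantly, we provide local optimality guarantees and explicitly show the theoretical effect of each hyperparameter in practical settings, extending previous results. Experimental comparisons on synthetic and UCI datasets show that our approach outperforms prior work in terms of explained variance, fairness, and runtime.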
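To make the fairness criterion concrete, the sketch below estimates the squared MMD between the two protected groups after projecting the data onto the columns of a $d \times k$ matrix $V$. The columns of $V$ are orthonormal, so $V$ is a point on the Stiefel manifold. The sketch assumes a Gaussian RBF kernel and the biased V-statistic estimator. The kernel, bandwidth, and function names are illustrative choices, and the REPMS solver itself is not shown.

```python
import numpy as np

def gaussian_mmd2(X: np.ndarray, Y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD with a Gaussian RBF kernel."""
    def k(A, B):
        # pairwise squared Euclidean distances between rows of A and rows of B
        sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    return float(k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean())

def projected_mmd2(V: np.ndarray, X_a: np.ndarray, X_b: np.ndarray, bandwidth: float = 1.0) -> float:
    """Squared MMD between groups a and b after projecting onto the columns of V
    (a d x k matrix with orthonormal columns)."""
    return gaussian_mmd2(X_a @ V, X_b @ V, bandwidth)
```

Fair PCA trades off explained variance $\operatorname{tr}(V^\top \Sigma V)$ against this quantity. For instance, if the groups differ only by a shift along one feature, projecting onto that feature yields a large MMD, while projecting onto an orthogonal direction does not.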
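In this paper, we investigate the problem of string-based molecular generation via variational autoencoders (VAEs), which have served as a popular generative approach for various tasks in artificial intelligence. We propose a simple yet effective idea to improve the task performance of VAEs. Our main idea is to maintain multiple decoders while sharing a single encoder, i.e., it is a type of ensemble technique. We first found that a naive ensemble of decoders may not be effective, because the bias of the ensembled decoders increases severely under their auto-regressive inference. To keep both the bias and the variance of the ensemble model small, our proposed technique is two-fold: (a) sampling a different latent variable for each decoder (from the estimated mean and variance provided by the shared encoder) to encourage diversified characteristics of the decoders, and (b) using a collaborative loss during training to control the aggregated quality of the decoders that use different latent variables. In our experiments, the proposed VAE model performs particularly well in generating samples from out-of-domain distributions.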
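Point (a) can be sketched with the reparameterization trick: each decoder receives its own latent sample drawn from the shared encoder's Gaussian posterior. The NumPy sketch below also averages the decoders' next-token distributions, which is one common way to aggregate an ensemble at inference time. That aggregation rule, the function names, and the array shapes are assumptions. The collaborative loss in (b) is not modeled.

```python
import numpy as np

def sample_latents_per_decoder(mu, log_var, num_decoders, rng):
    """Draw one reparameterized latent z_k = mu + sigma * eps_k per decoder from the
    shared encoder's diagonal Gaussian N(mu, sigma^2)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_decoders,) + mu.shape)  # independent noise per decoder
    return mu + sigma * eps  # shape: (num_decoders, *mu.shape)

def ensemble_next_token_probs(decoder_logits):
    """Average the per-decoder softmax distributions over the next token.
    decoder_logits: (num_decoders, batch, vocab) -> returns (batch, vocab)."""
    z = decoder_logits - decoder_logits.max(axis=-1, keepdims=True)  # stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.mean(axis=0)
```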
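Most object detection methods obtain objects by using non-maximum suppression (NMS) and its improved versions, such as Soft-NMS, which have a long history, to remove redundant bounding boxes. We challenge these NMS-based methods from three aspects: 1) the bounding box with the highest confidence value may not be the true positive with the largest overlap with the ground-truth box; 2) redundant boxes need to be suppressed, but true positives also need their confidence enhanced; 3) sorting candidate boxes by confidence value is not necessary, so full parallelism is achievable. In this paper, inspired by belief propagation (BP), we propose Confidence Propagation Cluster (CP-Cluster) to replace NMS-based methods; it is fully parallelizable and more accurate. In CP-Cluster, we borrow the message-passing mechanism of BP to penalize redundant boxes and enhance true positives simultaneously in an iterative way until convergence. We verify the effectiveness of CP-Cluster by applying it to various mainstream detectors such as FasterRCNN, SSD, FCOS, YOLOv3, YOLOv5, and CenterNet. Experiments on MS COCO show that our plug-and-play method, without retraining the detectors, steadily improves the mean mAP of all these state-of-the-art models over NMS-based methods by a clear margin of 0.2 to 1.9. Source code is available at https://github.com/shenyi0220/cp-cluster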
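For reference, the greedy NMS baseline that CP-Cluster is designed to replace can be written in a few lines. This minimal NumPy sketch is not the CP-Cluster algorithm and is not taken from the linked repository. It makes the three criticized properties visible:

- the kept box is chosen purely by confidence;
- overlapping boxes are only ever suppressed, never boosted;
- the loop depends on a global sort by score.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Classic greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]  # the global sort that CP-Cluster avoids
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # survivors: boxes whose overlap with the kept box is at most the threshold
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```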
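We present FacialFilmroll, a solution for spatially and temporally consistent editing of faces in one or multiple shots. We build upon unwrap mosaics [Rav-Acha et al. 2008] by specializing them to faces. We leverage recent techniques for fitting a 3D face model to monocular videos to (i) improve the quality of the mosaic for editing and (ii) allow the automatic transfer of edits from one shot to other shots of the same actor. We explain how FacialFilmroll is integrated into a post-production facility. Finally, we present video editing results using FacialFilmroll on high-resolution videos.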
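The key property of an unwrap mosaic is that every frame is rendered from one shared texture through a per-frame mapping. An edit painted once in texture space therefore appears consistently in all frames. The sketch below illustrates only that idea, using a nearest-neighbor lookup. It assumes per-pixel $(u, v)$ coordinates in $[0, 1]$ are already available, for example from a fitted 3D face model. It ignores occlusion, blending, and lighting, which a production system would need. The function name is hypothetical.

```python
import numpy as np

def render_from_mosaic(mosaic, uv):
    """Nearest-neighbor lookup: each frame pixel reads the mosaic texel that its
    (u, v) coordinate points to. mosaic: (H_t, W_t, 3); uv: (H, W, 2) in [0, 1]."""
    h_t, w_t = mosaic.shape[:2]
    cols = np.clip(np.rint(uv[..., 0] * (w_t - 1)).astype(int), 0, w_t - 1)
    rows = np.clip(np.rint(uv[..., 1] * (h_t - 1)).astype(int), 0, h_t - 1)
    return mosaic[rows, cols]  # (H, W, 3)
```

Because all frames sample the same `mosaic` array, modifying one texel changes every frame whose UV map points to it. This is the edit-propagation behavior the abstract relies on.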
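Robust face reconstruction from monocular images under general lighting conditions is challenging. Methods combining deep neural network encoders with differentiable rendering have opened the path for very fast monocular reconstruction of geometry, lighting, and reflectance. They can also be trained in a self-supervised manner for increased robustness and better generalization. However, their rasterization-based image formation models, as well as the underlying scene parameterization, limit them to Lambertian reflectance and poor shape details. More recently, ray tracing was introduced for monocular face reconstruction within a classic optimization-based framework and achieved state-of-the-art results. However, optimization-based approaches are inherently slow and lack robustness. In this paper, we build on the aforementioned approaches and propose a new method that greatly improves reconstruction quality and robustness in general scenes. We achieve this by combining a CNN encoder with a differentiable ray tracer. This lets us base the reconstruction on much more advanced personalized diffuse and specular albedos, a more sophisticated illumination model, and a plausible representation of self-shadows. The result is a big leap forward in the reconstruction quality of shape, appearance, and lighting, even in scenes with difficult illumination. With consistent reconstruction of face attributes, our method enables practical applications such as relighting and self-shadow removal. Compared to state-of-the-art methods, our results show improved accuracy and validity of the approach.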
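The sketch below illustrates why a ray tracer enables richer reflectance than a Lambertian model. It shades a single surface point with a diffuse term plus a specular term, both scaled by a visibility factor of the kind a shadow ray would provide. Several stand-in assumptions are made:

- a Blinn-Phong specular lobe;
- the `shininess` value;
- a single directional light of unit intensity.

The abstract does not state which reflectance or illumination model the paper uses.

```python
import numpy as np

def _normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def shade_point(normal, light_dir, view_dir, diffuse_albedo, specular_albedo,
                shininess=32.0, visibility=1.0):
    """RGB radiance at one surface point lit by a single directional light of unit
    intensity: Lambertian diffuse term + Blinn-Phong specular term (a stand-in),
    both scaled by a visibility factor in [0, 1] (0 means self-shadowed)."""
    n, l, v = _normalize(normal), _normalize(light_dir), _normalize(view_dir)
    n_dot_l = max(float(n @ l), 0.0)
    if n_dot_l == 0.0:  # light is behind the surface: no direct contribution
        return np.zeros(3)
    h = _normalize(l + v)  # half vector between light and view directions
    specular = np.asarray(specular_albedo, dtype=float) * max(float(n @ h), 0.0) ** shininess
    diffuse = np.asarray(diffuse_albedo, dtype=float) * n_dot_l
    return visibility * (diffuse + specular)
```

Setting `specular_albedo` to zero and `visibility` to one recovers plain Lambertian shading without shadows. That is the limitation the abstract attributes to rasterization-based methods.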
3D-aware image synthesis focuses on preserving spatial consistency in addition to generating high-resolution images with fine details. Recently, Neural Radiance Field (NeRF) has been introduced for synthesizing novel views with low computational cost and superior performance. While several works investigate generative NeRFs and achieve remarkable results, they cannot handle conditional and continuous feature manipulation during generation. In this work, we introduce a novel model, called Class-Continuous Conditional Generative NeRF ($\text{C}^{3}$G-NeRF), which can synthesize conditionally manipulated photorealistic 3D-consistent images by projecting conditional features to the generator and the discriminator. The proposed $\text{C}^{3}$G-NeRF is evaluated on three image datasets: AFHQ, CelebA, and Cars. Our model shows strong 3D consistency with fine details and smooth interpolation in conditional feature manipulation. For instance, $\text{C}^{3}$G-NeRF achieves a Fr\'echet Inception Distance (FID) of 7.64 in 3D-aware face image synthesis at $128^{2}$ resolution. Additionally, since $\text{C}^{3}$G-NeRF can synthesize class-conditional images, we report the FID of the generated 3D-aware images for each class of the datasets.
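One plausible reading of "projecting conditional features" is projection-style conditioning, in the spirit of the projection discriminator of Miyato and Koyama (2018). In that scheme, the class enters as an inner product with learned class embeddings. The sketch below assumes that reading. It makes the class code continuous by taking a convex combination of class embeddings, so the output interpolates smoothly between classes. All names and shapes are hypothetical.

```python
import numpy as np

def class_embedding(class_weights, embedding_table):
    """Continuous class code: convex combination of per-class embedding rows.
    class_weights: (num_classes,), non-negative and summing to 1;
    embedding_table: (num_classes, feat_dim)."""
    w = np.asarray(class_weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("class_weights must be non-negative and sum to 1")
    return w @ embedding_table  # (feat_dim,)

def projection_logit(features, class_weights, embedding_table, w_out, b_out=0.0):
    """Projection-style conditional discriminator output for one image:
    an unconditional linear head on the features plus <e(y), features>."""
    e_y = class_embedding(class_weights, embedding_table)
    return float(features @ w_out + b_out + features @ e_y)
```

Because the logit is linear in the class weights, mixing two classes 50/50 gives exactly the average of the two single-class logits. This is one simple way to obtain the "class-continuous" behavior the abstract describes.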
Cellular automata (CA) captivate researchers because of the emergent, complex individualized behavior that simple global rules of interaction produce. Recent advances in the field have combined CA with convolutional neural networks to achieve self-regenerating images. This new branch of CA is called neural cellular automata [1]. The goal of this project is to use the idea of neural cellular automata to grow prediction machines. We place many different convolutional neural networks in a grid. Each conv net cell outputs a prediction of what the next state will be and is trained to minimize its prediction error. Cells receive their neighbors' colors and fitnesses as input, and each cell's fitness score describes how accurate its predictions are. Cells can also move to explore their environment, with some stochasticity applied to their movement.
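The grid mechanics described above can be sketched with NumPy on a grid that wraps around at the edges. This hypothetical sketch does three things:

- it gathers each cell's eight neighbors' colors and fitness values as input;
- it defines fitness as the negative mean squared prediction error (the abstract only says fitness measures prediction accuracy);
- it proposes random one-step moves.

The per-cell convolutional networks and collision handling are left out.

```python
import numpy as np

# Moore neighborhood: the 8 offsets around a cell
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def neighbour_inputs(colors, fitness):
    """For every cell, stack its 8 neighbors' colors and fitness values
    (the grid wraps around at the edges).
    colors: (H, W, C); fitness: (H, W) -> returns (H, W, 8 * (C + 1))."""
    state = np.concatenate([colors, fitness[..., None]], axis=-1)  # (H, W, C + 1)
    shifted = [np.roll(state, shift=(dy, dx), axis=(0, 1)) for dy, dx in OFFSETS]
    return np.concatenate(shifted, axis=-1)

def fitness_from_predictions(predicted_next, actual_next):
    """Per-cell fitness: negative mean squared error of the predicted next state."""
    return -np.mean((predicted_next - actual_next) ** 2, axis=-1)

def random_moves(h, w, move_prob, rng):
    """Stochastic movement: with probability move_prob a cell proposes a random
    step in {-1, 0, 1}^2; otherwise its step is (0, 0)."""
    steps = rng.integers(-1, 2, size=(h, w, 2))
    moving = rng.random((h, w)) < move_prob
    return steps * moving[..., None]
```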
There is a dramatic shortage of skilled labor for modern vineyards. The Vinum project is developing a mobile robotic solution that autonomously navigates through vineyards for winter grapevine pruning. This requires an autonomous navigation stack for the robot pruning the vineyard. The Vinum project uses the quadruped robot HyQReal. This paper introduces an architecture that lets a quadruped robot move autonomously through a vineyard by identifying and approaching grapevines for pruning. The higher-level control is a state machine that switches between searching for destination positions, autonomously navigating towards those locations, and stopping so that the robot can complete a task. The destination points are determined by identifying grapevine trunks using instance segmentation from a Mask Region-Based Convolutional Neural Network (Mask-RCNN). These detections are passed through a filter that removes redundant and noisy detections. The combination of these components forms the basis of the proposed architecture.
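The higher-level control and the detection filter can be sketched as follows. This is a minimal Python sketch under stated assumptions: the three modes mirror the abstract. The filter's merge radius, minimum hit count, and running-mean position update are hypothetical choices for removing redundant and noisy trunk detections, not the Vinum implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
import math

class Mode(Enum):
    SEARCH = auto()    # look for grapevine trunks to define a destination
    NAVIGATE = auto()  # walk towards the current destination
    STOP = auto()      # stand still while the pruning task runs

@dataclass
class TrunkFilter:
    """Merges detections closer than merge_radius (redundancy) and only reports
    trunks seen at least min_hits times (noise). Thresholds are hypothetical."""
    merge_radius: float = 0.3
    min_hits: int = 3
    tracks: list = field(default_factory=list)  # each entry: [x, y, hits]

    def add(self, x, y):
        for t in self.tracks:
            if math.hypot(t[0] - x, t[1] - y) < self.merge_radius:
                n = t[2]
                t[0] = (t[0] * n + x) / (n + 1)  # running mean of the position
                t[1] = (t[1] * n + y) / (n + 1)
                t[2] = n + 1
                return
        self.tracks.append([x, y, 1])  # first sighting of a new trunk

    def confirmed(self):
        return [(t[0], t[1]) for t in self.tracks if t[2] >= self.min_hits]

def next_mode(mode, has_target, reached_target, task_done):
    """Higher-level control loop: SEARCH -> NAVIGATE -> STOP -> SEARCH."""
    if mode is Mode.SEARCH and has_target:
        return Mode.NAVIGATE
    if mode is Mode.NAVIGATE and reached_target:
        return Mode.STOP
    if mode is Mode.STOP and task_done:
        return Mode.SEARCH
    return mode
```

In this sketch, only `confirmed()` trunks would be used as destinations in SEARCH mode. A single spurious Mask-RCNN detection therefore never triggers navigation.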